RR  3/86 


Major  R.S.  Collyer 

THE UNITED STATES NATIONAL
TECHNICAL INFORMATION SERVICE
IS AUTHORISED TO
REPRODUCE AND SELL THIS REPORT




1st  PSYCHOLOGICAL 

RESEARCH 

UNIT 


APPROVED FOR PUBLIC RELEASE


Research  Report  3/86 


A COMPARISON OF TWO TASK RATING SCALES OF PHYSICAL DEMAND


by 

Major  Robert  S.  Collyer 


© Commonwealth of Australia
August 1986


This Directorate of Psychology publication has been prepared by the
1st Psychological Research Unit and is authorized for issue by DPSYCH-A.


LTCOL P. N. DRAKE-BROCKMAN

Commanding  Officer 

1st  Psychological  Research  Unit 

NBH  3  -  44 

Northbourne  House 

TURNER  ACT  2601 


Paper  presented  at  the  21st  Annual  Conference  of  the  Australian 
Psychological  Society,  James  Cook  University,  Townsville,  QLD.  25-29  August, 
1986. 


TABLE OF CONTENTS

Introduction

Method

    Samples

    Apparatus

    Procedure

Results and Discussion

    Scale Transformations

    Interrater Reliability

    Scale Intercorrelations

    Scale Validities

Implications for Future Job Analysis

References

Appendices



LIST OF TABLES


Table

1  Interrater Reliability -- Vehicle Mechanics

2  Interrater Reliability -- Driving Trades

3  PSE/PPE Rank Order Intercorrelations -- Vehicle Mechanics

4  CIP-9/CIP-7 Rank Order Intercorrelations -- Vehicle Mechanics

5  PSE/PPE Rank Order Intercorrelations -- Driving Trades

6  CIP-9/CIP-7 Rank Order Intercorrelations -- Driving Trades

7  Range and Means for Rank Order Intercorrelations of Four Task
   Factors Across Five Corps



Abstract 


Task  analysis  to  specify  the  physical  demands  of  work  is  expensive  in 
terms  of  manpower  skills,  equipment,  and  time.  Job  analysis  methods  to 
identify  tasks  that  should  be  subject  to  detailed  task  analysis  are  an 
important  part  of  any  strategy  to  specify  physical  demands  criteria  for  jobs. 
This  paper  reports  research  comparing  two  sets  of  task  inventory  rating  scales 
used  to  identify  important,  physically  demanding  tasks.  The  physical  demand 
scales  are  Perceived  Physical  Effort,  which  is  part  of  Physical  Abilities 
Analysis,  and  Physical  Strength  and  Endurance,  a  scale  developed  by  the  US  Air 
Force.  Two  scales  assessing  Consequences  of  Inadequate  Performance  were  also 
used.  The  scales  were  administered  to  large  samples  from  two  Australian  Army 
employment  groups.  Scale  reliabilities  and  task  rank  order  intercorrelations 
are  reported  and  implications  for  the  scales'  future  use  are  discussed. 




The  findings  and  views  expressed  in  this  paper  are  the  result  of  the 
author's  research  studies  and  are  not  to  be  taken  as  the  official  opinion  of 
the  Department  of  Defence  (Army  Office). 




This research arose from the Australian Army's need to specify the
physical demands of Army jobs. A rigorous methodology was required, but it
had to take into consideration the need for (a) economy of resources in the
setting of criteria, (b) the limited time available for the testing of
personnel as part of the process of allocating recruits to trades, and (c) the
availability of physical capacity test(s) related to job criteria.

A literature search (Collyer & James, 1985) indicated that a range of
approaches have been employed by other researchers, but as Campion (1983)
points  out,  most  approaches  can  be  categorized  as:  (a)  measures  of  metabolic 
requirements,  (b)  measures  of  strength  requirements,  or  (c)  measures  of 
multiple  physical  abilities. 

Measurement  of  metabolic  requirements  involves  "physiological"  measures 
like  those  of  oxygen  uptake  or  heart  rate  during  task  performance.  Oxygen 
uptake  measures  can  be  made  by  collecting  and  analysing  expired  oxygen  using  a 
Douglas Bag (Astrand & Rodahl, 1977), or Kofranyi-Michaelis gas meter
(Consolazio, 1971). Measures of heart rate are an indirect method of
predicting oxygen consumption because they are linearly related within an
individual (Astrand & Ryhming, 1954). Although these metabolic methods are
the  most  accurate,  they  are  cumbersome  to  administer  and  require  expensive 
instrumentation  and  trained  personnel.  These  latter  three  characteristics  are 
undesirable  in  a  method  for  large  scale  implementation. 

Measures  of  strength  requirements  rely  on  assessments  of  weights  lifted, 
heights  to  which  they  are  lifted,  and  distances  carried.  A  variety  of  methods 
are  available  for  the  assessment  of  human  strength,  but  they  all  utilize  one 
of  three  muscle  contractions:  (a)  isometric,  (b)  isotonic,  or  (c)  isokinetic. 
Isometric  strength  is  relatively  easy  to  measure  accurately  using  practical 
standardized  methods,  and  some  researchers  (e.g.,  Chaffin,  1975)  recommend  its 
use,  although  care  must  be  exercised  when  doing  so  since  isometric  (also  known 
as  static)  strength  is  not  perfectly  correlated  with  the  other  strengths. 
Research  (Ayoub,  et  al.  1982;  Hogan,  Ogden,  Gebhardt,  &  Fleishman,  1979; 
Sharp,  et  al.  1980)  has  shown  that  approximately  90  percent  of  tasks  that  are 
physically  demanding  are  so  because  of  weights  lifted  and  carried.  The  focus 
of  these  findings  on  strength,  combined  with  the  practicalities  of  measuring 
isometric  strength,  offered  hope  of  a  practical,  economic  method  for  Army  use. 

Measures  of  multiple  physical  abilities  involve  assessment  of  such 
characteristics  as  strength,  stamina,  body  flexibility,  and  balance.  Physical 
Abilities  Analysis  (Fleishman  &  Hogan,  1978;  Myers,  Gebhardt,  &  Fleishman, 
1980a;  1980b)  is  a  major  example  of  this  technique.  It  involves  experienced 
job  personnel  making  judgments  about  task  demands  using  nine  specific  fitness 
scales  and  one  general  effort  scale.  The  scales  have  been  empirically  derived 
from  correlational  studies  and  require  little  training  or  equipment  to 
administer.  The  economy  of  using  these  rating  scales  was  attractive,  but 
selection  tests  to  assess  all  the  abilities  measured  by  Physical  Abilities 
Analysis  were  expected  to  be  more  complex  than  would  be  the  case  if  measures 
of  strength  alone  were  utilized. 

Each  method  has  advantages  and  disadvantages,  but  each  is  a  form  of  task 
analysis  that  is  designed  to  be  applied  to  specific  tasks.  Most  jobs  consist 
of  a  multitude  of  tasks,  and  most  analyses  are  conducted  to  develop  selection 
criteria  to  be  applied  to  job  types  rather  than  to  individual  jobs  or 
positions.  These  job  types  often  involve  hundreds  or  even  thousands  of  tasks. 
Regardless  of  which  form  of  task  analysis  is  employed,  job  analysis  to 
identify tasks which should be analysed in detail is an important preliminary
to the specification of physical demands criteria, although the form of job
analysis must take the type of task analysis into account. This paper
discusses a method of job analysis that has been used for this purpose.

Methods of job analysis that include assessment of physical demands are
available. Examples are the Position Analysis Questionnaire (McCormick,
Jeanneret and Mecham, 1969), Functional Job Analysis (Fine and Wiley, 1971)
and Physical Abilities Analysis (Fleishman and Hogan, 1978). Another job
analysis method that provides detailed information is the task inventory
survey method paired with the Comprehensive Occupational Data Analysis
Programs analysis package (TI/CODAP).

The Australian Army has used this method of job analysis since 1975.
TI/CODAP was considered to meet two requirements of the research, those of
rigorous methodology and economy. It provides an economical way to obtain job
task data from large numbers of workers in diverse locations. The
information, obtained from job incumbents and/or supervisors, is detailed,
quantifiable data that can be manipulated by CODAP to provide a wide range of
group job descriptions. Because the data are quantifiable they can be
validated, and the general methodology is supported by a large body of sound
research (see Christal, 1974, for examples).

The TI/CODAP job analysis uses survey questionnaires administered by
mail. The main components of these questionnaires are a task inventory,
listing  all  significant  tasks  in  the  employments  surveyed,  and  task  rating 
scales.  Experienced  supervisors  use  the  scales  to  rate  task  characteristics 
such  as  the  amount  of  training  emphasis  that  should  be  given  in  formal 
training,  or  the  consequences  of  inadequate  performance  of  a  task.  Task 
rating  scales  such  as  these  have  been  researched  and  used  extensively  overseas 
(e.g.,  Christal,  1974;  Jansen,  1982)  and  in  Australia  (e.g.,  limited 
distribution  occupational  analysis  reports  within  the  Department  of  Defence), 
but  there  was  no  Australian  research  using  scales  of  physical  demand.  Prior 
experience  had  shown  that  task  rating  scales  used  successfully  overseas  did 
not  necessarily  work  for  the  Australian  Army  (e.g.,  the  Training  Emphasis 
scale).  This  was  usually  because  of  the  diverse  nature  of  Australian 
employments.  It  was  decided  to  select  and  pilot  scales  that  seemed  to 
identify  physically  demanding  tasks  to  determine  their  appropriateness  to  the 
Australian  situation. 

The  literature  search  had  identified  two  main  streams  of  research  that 
offered  promise  of  suitable  task  rating  scales.  The  first  was  work 
originating  from  the  US  Air  Force  (who  were  the  original  developers  of  the 
TI/CODAP method). The second was Physical Abilities Analysis, already
mentioned  as  a  form  of  task  analysis.  Scales  from  these  research  areas  were 
selected. 

US Air Force research reported by Gott and Alley (1980) showed that a ten
point  rating  scale  (0  to  9)  could  be  used  to  elicit  reliable  ratings  of 
physical  strength  and  endurance.  Reliability  was  assessed  using  the  Lindquist 
(1953)  method  of  intraclass  correlation.  This  reliability  is  in  itself  a  form 
of  convergent  validity,  and  other  validity  studies  were  underway,  so  the  scale 
was  selected  for  trial.  The  Physical  Strength  and  Endurance  (PSE)  scale  was 
modified  to  provide  metric  measures  of  weight  and  height,  and  to  combine  the 
two  lower  categories,  thus  making  a  nine-point  scale  anchored  at  each  point 
(see  Appendix  1).  This  modification  was  necessary  because  the  version  of 
CODAP on line in the Department of Defence was unable to distinguish between
"X"  and  "0"  as  scale  ratings.  The  "X"  for  "No  knowledge  of  task  requirement" 
was  considered  necessary  to  the  research,  and  the  distinction  between  the  "0" 
and  "1"  scale  point  definitions  was  not  vital  to  the  research  objectives. 


Physical Abilities Analysis comprises ten scales. Nine of these scales
assess specific abilities (Static Strength, Dynamic Strength, Explosive
Strength, Trunk Strength, Stamina, Extent Flexibility, Gross Body
Coordination, and Equilibrium). Whilst these scales have been shown to have
good reliability and validity (Hogan, et al. 1979; Myers, et al. 1980a;
1980b), this same research has shown the tenth scale, Perceived Physical
Effort (PPE), to have good reliability and validity (i.e., it correlates
highly with the specific scales and with actual metabolic costs) in
identifying physically demanding tasks. Use of the full ten scales for each
Army employment group was seen as a major undertaking and one which would be
better avoided if possible. Because of the good reports on the PPE scale, it
was selected for research and comparison with the PSE scale.

Because it is not reasonable to use unimportant tasks as the criterion
tasks by which job demands are specified, a measure of
the consequences of inadequate performance of tasks was incorporated into the
research. Two Consequences of Inadequate Performance scales were selected.
Both scales assess the same task characteristics, but one (CIP-9) is a
nine-point scale anchored at each point to pair with the nine-point PSE scale and
the other (CIP-7) is a seven-point scale anchored at the mid-point and
extremities to pair with the seven-point PPE scale.

A comprehensive job analysis as proposed, while being a relatively
economical way to obtain job data from a large number of workers, is still
time consuming in that it requires experienced workers to spend between two
and  three  hours  completing  the  questionnaire.  Determination  of  optimum  sample 
sizes  can  provide  substantial  savings  in  time  whilst  retaining  the  reliability 
and  validity  of  the  results.  This  research  incorporated  a  number  of  checks  to 
aid  in  determining  optimum  sample  size.  Samples  were  from  employments 
selected  to  provide  a  substantial  range  of  variables  representative  of  normal 
Army  employments,  and  were  expected  to  provide  a  rigorous  evaluation  by 
comparison  of  reliability  coefficients  and  rank  order  correlations  of  task 
mean  ratings  for  sub-samples.  Rank  order  correlations  were  used  because  sets 
of  critical,  demanding  tasks  were  to  be  selected  for  task  analysis.  The 
scales  were  not  required  to  specify  actual  task  demands  in  terms  of  weights, 
heights,  distances,  or  aerobic  demand,  even  though  one  scale  used  actual 
weights  and  heights  as  scale-point  definitions. 
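
The rank order comparison of task mean ratings between sub-samples can be illustrated with a short sketch. The function names and the example ratings below are hypothetical and are not taken from the survey data or from CODAP; the sketch simply computes a Spearman rank order correlation (Pearson correlation of ranks, with tied values given their average rank) between two sets of task means.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_order_correlation(x, y):
    """Spearman rank order correlation between two sets of task means."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical task mean ratings for the same five tasks in two sub-samples.
sub_sample_a = [6.1, 2.3, 4.8, 7.5, 3.0]
sub_sample_b = [5.9, 2.9, 4.1, 7.8, 2.5]
rho = rank_order_correlation(sub_sample_a, sub_sample_b)
```

A high value indicates that the two sub-samples would select much the same set of tasks for detailed task analysis, which is the property the scales were required to have.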


This  paper  reports  research  undertaken  to  evaluate  the  two  pairs  of  task 
rating  scales,  PSE/CIP-9  and  PPE/CIP-7,  as  scales  that  can  be  used  to  identify 
important,  physically  demanding  tasks  for  detailed  task  analysis.  In  addition 
to  assessing  the  suitability  of  the  scales,  the  research  also  sought  to 
identify  optimum  sample  sizes  for  the  job  analysis.  Although  the  job  analysis 
method  used  was  TI/CODAP,  which  is  not  commonly  available,  the  rating  scales 
and  procedures  are  ones  that  can  be  used  with  other  analysis  packages. 


Method 

Samples 

Developing  samples  for  surveys  of  this  type  is  a  matter  of  experience  and 
judgment.  The  aim  is  to  select  samples  that  are  representative  of  the  range 
of  work  performed  and  the  conditions  under  which  it  is  performed.  To  evaluate 
a  level  of  confidence  in  the  results,  it  is  necessary  to  quantify  the 
agreement  between  raters.  The  Lindquist  (1953)  method  of  assessing  interrater 
agreement  was  used.  Experience  in  Australia  and  research  overseas  (e.g., 
Jansen,  1982)  has  shown  that,  depending  on  the  homogeneity  of  the  employment, 
a suitable level of interrater agreement can be obtained using 30 to 50
raters. When the employments are heterogeneous (i.e., have a large diversity
of job types), this should be interpreted as 30 to 50 raters per major
analysis variable, so agreement can be evaluated for sub-samples in case
different rating policies exist. Because many Australian Army employments are
heterogeneous by comparison to US forces' ones (where the selected scales were
developed), it was necessary to select samples that would enable assessment of
the scales with heterogeneous employment groups.

The sampling strategy employed was a purposive one. It was designed to
obtain as many experienced raters as possible up to at least 40 in each
identified major analysis variable (although in some cases it was not possible
to obtain 40 raters). In addition to these considerations, the sample
selected was larger than normally considered necessary for good quality
information so that the data could still be used if unforeseen variables were
later identified. Two employment groups were selected for sampling, the
vehicle mechanics and the driving trades.

Vehicle mechanics. A target sample of 314 vehicle mechanics was selected
for survey. This employment was chosen because it is a technical employment
with a wide range of work and work conditions. Vehicle mechanics do all types
of mechanical repairs and maintenance on vehicles ranging from light vehicles
(e.g., staff cars and motor cycles) to heavy tanks, tank transporters, earth
movers, and dockside cranes. There are two main types of repair work: base
repair, where major and minor repairs are conducted on all types of vehicles
in base workshops, and field repair, where major or minor repairs are
conducted on all types of vehicles using equipment at hand in the field or in
mobile workshops. Field repair work is subject to the vagaries of both weather
and combat movement. These two types of repair work provide two major
variables, with base repair units being generalized as Non Field Force Command
units and field repair units as Field Force Command ones. Previous
occupational analysis studies have shown this employment group to be
homogeneous with respect to tasks performed (Healy, 1984) but the varying
equipment and work conditions were expected to provide more variability for
this type of study.

The sample included all vehicle mechanic Sergeants, Staff Sergeants and
Warrant Officers, and some experienced Corporals. The two pairs of survey
scales were allocated randomly among the sample. The returned questionnaires
(the actual sample) represented 87.6 percent of the intended sample, and there
were 145 PSE/CIP-9 questionnaires and 130 PPE/CIP-7 ones.


Driving trades. From a population of 1951 drivers, 950 were selected for
survey (the intended sample). The drivers' employment group includes operators
of specialist vehicles and transport supervisors, as well as drivers of trucks
and staff cars across a number of Corps. In some Corps there were few
drivers, in others many. Drivers from five Corps (Artillery, Infantry,
Signals, Engineers and Transport) were selected for sampling because they were
considered to be representative of drivers' work. This was seen as a
heterogeneous group and one where it was known that some drivers had
difficulty in coping with the physical demands of some tasks.

The sample was selected to obtain representation of trade and Corps and
also of unit type (i.e., Field Force Command or Non Field Force Command).
Corps and unit type were the major variables. The two pairs of rating scales
were randomly allocated among the sample. A total of 365 PSE/CIP-9
questionnaires and 349 PPE/CIP-7 ones were processed (the actual sample).
These represented 77.2 percent and 73.2 percent of the target sample
respectively, and 36.6 percent of the Driving Trades population.


Panel raters. Because job analysis surveys are time consuming to
administer and take respondents from their jobs, checks were incorporated into
[...] resources. The task inventories were developed by panels of experienced
workers (described in the next section) and one check was to arrange for these
panel members to provide task ratings using the PSE scale for both employment
groups, and the CIP-9 scale for one group. It was not possible to test the
PPE/CIP-7 scales in this way because of a shortage of time at the completion
of the inventory development panels. These panel members, who were considered
to be knowledgeable raters, had just completed four to five days analysing the
employments and their ratings were to be compared to the major sample ratings
obtained later in the research. These ratings, for the vehicle mechanics,
were also compared to the panel members' own ratings (using the same scales)
obtained during the major survey (test-retest reliability). Seven panel
members provided ratings on PSE and CIP-9 for the vehicle mechanics
questionnaire both as experienced panel raters and as raters during the main
survey. Fourteen panel members provided ratings as experienced panel raters
for the driving trades questionnaire.


Survey questionnaires. Four survey questionnaires were constructed, with
one task inventory and set of background information questions and two pairs
of task rating scales per employment group. Previous job analyses had
provided job task data which were held as part of the Military Employments
Data Bank. Existing task inventories from this data bank were used as the
basis for experienced panels. Panel members, representing different areas of
the employment, updated and restructured the inventories under the guidance of
a trained inventory developer. Inventory development procedures were normal
Army occupational analysis ones as detailed in Standing Operating Procedures
for the Military Employments Research and Information Team (1982). Equivalent
results can be obtained by following the procedures contained in Archer and
Fruchter (1963). Background information questions, survey instructions, and
rating scales were added later. The final questionnaire for the vehicle
mechanics contained 501 tasks and 15 background information questions. The
driving trades questionnaire contained 349 tasks and 17 background information
questions.

Rating scales. Two pairs of rating scales were employed: Physical
Strength and Endurance paired with the nine-point Consequences of Inadequate
Performance scale, and Perceived Physical Effort paired with the seven-point
Consequences of Inadequate Performance scale.

Physical Strength and Endurance (PSE) is defined as involving significant
use  of  the  "large"  muscle  groups  in  the  arms,  back,  or  legs.  These  include 
requirements  for  lifting,  lowering,  or  carrying  heavy  or  cumbersome  objects, 
pushing  or  pulling,  torquing,  or  any  other  demand  for  frequent  or  continuous 
exertion  or  muscular  effort.  Raters  were  told  to  make  their  ratings  on  the 
basis  of  (a)  the  most  demanding  aspects  of  each  task,  (b)  the  level  of  demand 
placed  on  a  single  individual  performing  the  task,  and  (c)  the  level  of  demand 
required  by  the  complete  task  from  start  to  finish  (Appendix  1-2).  The  scale 
has  nine  points,  with  anchor  points  described  in  terms  of  weights,  heights,  or 
equivalent muscular effort. Raters are given the opportunity to indicate with
an  "X"  if  they  feel  they  do  not  know  enough  about  the  task  requirements  to 
provide  a  rating. 


Perceived Physical Effort (PPE) is defined as the degree of physical
exertion experienced in performing a single task or series of tasks. The
scale has seven points with example tasks at the lower end, near the middle,
and towards the upper end (Appendix 2). These example tasks are located on
the scale according to the relative metabolic cost of their performance.
Raters were asked to rate how much effort it takes to perform each task.

Consequences of Inadequate Performance (CIP) is a measure of the
seriousness of probable consequences resulting from a task not being performed
adequately. It is defined in terms of probable injury or death, failure to
accomplish a critical mission, wasted supplies, damaged equipment, and wasted
hours of work. Two CIP scales were selected. Each scale measures the same
characteristic. The differences between them are that one is a nine-point
scale (CIP-9) anchored at each point, to pair with the nine-point PSE scale
(see Appendix 3), and the other is a seven-point scale (CIP-7) anchored at the
mid-point and the extremities to pair with the seven-point PPE scale (see
Appendix 4). The seven-point scale uses example tasks to help define
scale-point anchors.


Survey administration. The surveys were conducted by mail using nominated
officers within units to administer them. Detailed instructions for
completing the surveys were included in the questionnaire booklets and were
supplemented by administrative instructions to the administering officers.
Raters were instructed to rate all tasks using their total military
experience,  not  just  the  situation  in  their  present  unit.  After  rating  the 
tasks,  respondents  completed  the  background  information  questions  and  sealed 
the  booklet  in  an  envelope  addressed  to  the  project  team.  This  was  handed  to 
the  administering  officer  for  return  to  Canberra.  There  was  no  requirement 
for  booklets  to  be  completed  in  the  presence  of  the  administering  officer. 
These  administrative  arrangements  were  normal  for  Army  occupational  analysis 
surveys  and  most  respondents  had  probably  completed  a  survey  booklet  at  some 
time  in  their  career. 


Results  and  Discussion 

Scale Transformations

Since each survey respondent can have a personal assessment of what is
"average" and rate according to that assessment when using the CIP scales,
CIP-9 and CIP-7 were treated as relative scales. The PPE scale was also
treated as a relative scale since it was judged that most raters would fix a
frame of reference of "effort" within which to rate tasks. Ratings for these
three scales were standardized to a mean of five and a standard deviation of
one. This is a standard option available as part of the CODAP analysis
system. Because the wording of the anchor points for the PSE scale made
reference to specific weights and heights, it was judged to be an absolute
scale and therefore unsuited for standardization.
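
The standardization of relative-scale ratings can be sketched as follows. This is an illustration only: it assumes that each rater's ratings are rescaled separately across the tasks that rater rated, and that the population standard deviation is used. The source does not describe how the CODAP option is computed, and the function name and example ratings are hypothetical.

```python
from statistics import mean, pstdev


def standardize_rater(ratings, target_mean=5.0, target_sd=1.0):
    """Rescale one rater's ratings to the target mean and standard deviation.

    None marks a task the rater did not rate; it is passed through unchanged.
    Assumption: the population SD is used (not confirmed by the source).
    """
    given = [r for r in ratings if r is not None]
    m, sd = mean(given), pstdev(given)
    if sd == 0:
        # A rater who gave every task the same rating carries no ranking
        # information, so every rated task is placed at the target mean.
        return [None if r is None else target_mean for r in ratings]
    return [
        None if r is None else target_mean + target_sd * (r - m) / sd
        for r in ratings
    ]


# Two hypothetical raters who order the tasks identically but use
# different parts of the scale.
low_rater = standardize_rater([1, 2, 3, 4, 5])
high_rater = standardize_rater([3, 4, 5, 6, 7])
```

After rescaling, the two hypothetical raters have identical standardized ratings, so differences in each rater's personal notion of "average" no longer affect the task means.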


Interrater Reliability

Interrater reliability statistics used were based on the Lindquist (1953)
method of intraclass correlation as applied using the CODAP REXALL program.
Two indices of reliability are normally reported by REXALL, and a third has
been calculated. The first, R_11, is the single rater reliability, which
approximates the average of all possible pairwise correlations. The second,
R_kk, is the reliability for a sample of k raters, which is the expected
correlation between the set of observed sample task means and the task means
of an hypothetical equivalent sample. If calculated R_11 and R_kk values meet
or exceed .20 and .90 respectively, they are interpreted as meaning that
sufficient rater agreement exists to produce stable estimates of task mean
values (see Jansen (1982) and Goody (1976) for more detailed discussions on
the use of REXALL). Sample size is also an important consideration in
assessing interrater reliability, especially in evaluating causes of poor
agreement. Experience has shown that between 30 and 40 raters per analysis
category can be expected to give reliable results. Because sub-sample sizes
vary greatly in this research, the Spearman-Brown prophecy formula, which
algebraically describes the relationship between sample size and interrater
reliability, was used to calculate a standard R_50,50 for a sample size of k=50
raters.
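
The Spearman-Brown step can be illustrated with a minimal sketch. The formula itself is standard: the reliability of the mean of k raters is k times the single rater reliability, divided by 1 plus (k - 1) times the single rater reliability. The function names are hypothetical and the code is not the REXALL implementation; the worked values use the single rater reliability of .52 and the 134 raters reported for the PSE scale, all vehicle mechanic raters, in Table 1.

```python
def spearman_brown(r11, k):
    """Reliability of the mean of k raters, given single rater reliability r11."""
    return k * r11 / (1 + (k - 1) * r11)


def meets_criteria(r11, rkk, min_r11=0.20, min_rkk=0.90):
    """Check the .20 and .90 criteria cited in the text (Jansen, 1982)."""
    return r11 >= min_r11 and rkk >= min_rkk


# PSE scale, all vehicle mechanic raters (Table 1): single rater
# reliability .52 with 134 raters.
r11 = 0.52
rkk_observed_sample = spearman_brown(r11, 134)   # reliability for k = 134
rkk_standard_sample = spearman_brown(r11, 50)    # standardized to k = 50
acceptable = meets_criteria(r11, rkk_observed_sample)
```

Rounded to two places these give .99 and .98, matching the tabled entries. Standardizing every sub-sample to k = 50 lets reliabilities be compared across groups whose actual sizes range from 7 to more than 250 raters.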

Interrater agreement results are summarized for the vehicle mechanics
sample in Table 1 and for the driving trades sample in Table 2. Comparison of
R_11 and R_kk results against the desirable criteria of .20 and .90 (Jansen,
1982) respectively indicates that results for the total rating set on all
scales were very good for the vehicle mechanics, showing the close agreement
between supervisors as to which tasks were important and which ones were
physically demanding. The reliability results for the driving trades were
also satisfactory, though the R_11 values tended to be lower than those for the
vehicle mechanics. These figures for the drivers reflect the more diverse
nature of that employment group, but all values are satisfactory. It is also
noteworthy that the R_11 and R_kk values for the panel raters of both surveys
showed good agreement, as did the values calculated for the vehicle mechanic
panel members' ratings from the actual survey. It was not possible to collect
survey data for the driving trades panel members during the actual survey.


Table 1

Interrater Reliability -- Vehicle Mechanics

CATEGORY              SCALE    R_11    R_kk    R_50,50    k

All raters            PSE      .52     .99     .98        134
                      CIP-9    .41     .99     .97        134
                      PPE      .62     .99     .99        122
                      CIP-7    .45     .99     .98        122

FF Comd raters        PSE      .53     .98     .98        50
                      CIP-9    .42     .97     .97        50
                      PPE      .65     .99     .99        42
                      CIP-7    .47     .97     .98        42

Non FF Comd raters    PSE      .52     .99     .98        83
                      CIP-9    .40     .98     .97        84
                      PPE      .60     .99     .99        80
                      CIP-7    .44     .98     .97        80

Experienced panel     PSE      .56     .90     .98        7
                      CIP-9    .46     .85     .98        7

Panel/survey          PSE      .67     .93     .99        7
                      CIP-9    .48     .86     .98        7


Table 2

Interrater Reliability -- Driving Trades

CATEGORY              SCALE    R_11    R_kk    R_50,50    k

All raters            PSE      .36     .99     .97        261
                      CIP-9    .26     .99     .95        260
                      PPE      .42     .99     .97        260
                      CIP-7    .27     .99     .95        260

FF Comd raters        PSE      .35     .99     .96        138
                      CIP-9    .21     .97     .93        144
                      PPE      .40     .99     .97        134
                      CIP-7    .27     .98     .95        136

Non FF Comd raters    PSE      .37     .99     .97        124
                      CIP-9    .28     .98     .95        126
                      PPE      .39     .99     .97        125
                      CIP-7    .27     .98     .95        125

Corps 1 raters        PSE      .38     .99     .97        145
                      CIP-9    .28     .98     .95        146
                      PPE      .45     .99     .98        144
                      CIP-7    .28     .98     .95        144

Corps 2 raters        PSE      .39     .96     .97        34
                      CIP-9    .26     .92     .95        34
                      PPE      .39     .96     .97        34
                      CIP-7    .21     .91     .93        36

Corps 3 raters        PSE      .35     .96     .97        41
                      CIP-9    .22     .91     .93        38
                      PPE      .38     .96     .97        38
                      CIP-7    .23     .92     .94        39

Corps 4 raters        PSE      .31     .92     .96        26
                      CIP-9    .25     .89     .94        26
                      PPE      .40     .96     .97        32
                      CIP-7    .33     .94     .96        33

Corps 5 raters        PSE      .33     .89     .96        17
                      CIP-9    .28     .88     .95        18
                      PPE      .36     .89     .97        12
                      CIP-7    .26     .82     .95        13

Experienced panel     PSE      .61     .94     .99        14


The interrater reliability statistics indicate that personnel experienced
in the employments surveyed can agree closely on which tasks are physically
demanding and which ones are not. They can also agree which tasks are
important and which ones are not when considered in relation to unit mission,
safety, or loss/damage to stores. This agreement holds for all raters and for
sub-groups by Corps or type of Command for each of the four scales. Use of
the Spearman-Brown prophecy formula indicates that, should an occupational
analysis survey be conducted using any of these scales, reliable results can
be expected by surveying approximately 50 experienced trade personnel. This
demonstrates a considerable saving in staff time, and in job incumbent time
required to complete survey questionnaires. Time per person to complete these
questionnaires was typically between two and three hours. The high interrater
reliability results from the panel members' ratings suggest that this set of
task means may also be adequate for identifying tasks for task analysis. The
reliability results for k=50 raters are consistent with previous reports
(e.g., Jansen, 1982).
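
The Spearman-Brown projection referred to above can be sketched in a few
lines of Python. This is an illustrative sketch only, not the program used in
the survey: the function names are hypothetical, and it assumes that the
reliability of the mean of k raters is projected from the single-rater
reliability R11 in the standard way, Rkk = k*R11 / (1 + (k - 1)*R11).

```python
import math


def spearman_brown(r11: float, k: float) -> float:
    """Projected reliability of the mean of k raters, given single-rater reliability r11."""
    return k * r11 / (1.0 + (k - 1.0) * r11)


def raters_needed(r11: float, target: float) -> int:
    """Smallest whole number of raters whose mean rating reaches the target reliability.

    Obtained by solving the Spearman-Brown formula for k and rounding up.
    """
    k = target * (1.0 - r11) / (r11 * (1.0 - target))
    return math.ceil(k)


if __name__ == "__main__":
    # Vehicle mechanics experienced panel, PSE scale: R11 = .56 with k = 7 raters.
    print(round(spearman_brown(0.56, 7), 2))   # reliability of the 7-rater mean
    print(round(spearman_brown(0.56, 50), 2))  # projection to 50 raters
    # Raters needed to reach the .90 criterion from a low single-rater value of .21.
    print(raters_needed(0.21, 0.90))
```

Under these assumptions, even the lowest single-rater values reported for the
driving trades reach the .90 criterion with fewer than 50 raters, which is in
line with the recommendation above.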

Scale  Intercorrelations 

Part of the survey aim was to compare how the pairs of rating scales
selected important physically demanding tasks for task analysis. To make this
comparison, the rank ordering of the tasks based on task mean ratings was
assessed for PSE versus PPE and CIP-9 versus CIP-7. Tables 3 to 6 report the
rank order intercorrelations for these comparisons. For the drivers,
correlations for the ratings across the five Corps are summarized in Table 7.
All correlations are significant (P<.01).
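
A rank order correlation of this kind can be illustrated as follows. The
sketch assumes a Spearman coefficient computed as the Pearson correlation of
the two sets of ranks, with tied task means given their average rank; the
report does not state which rank order coefficient or tie treatment was used,
and the function names and task means below are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # mean of 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical task mean ratings for five tasks on two physical demands scales.
    pse_means = [7.1, 5.4, 3.2, 6.0, 2.1]
    ppe_means = [6.2, 5.0, 2.9, 4.1, 3.0]
    print(round(spearman_rho(pse_means, ppe_means), 2))
```

The sketch does not handle the degenerate case in which every task has the
same mean rating on a scale, since the coefficient is then undefined.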


Table 3

PSE/PPE Rank Order Intercorrelations -- Vehicle Mechanics

                             2     3     4     5     6     7     8

1. PSE all raters          .99   1.0   .94   .95   .96   .96   .95
2. PSE FF Comd              --   .98   .94   .96   .94   .94   .93
3. PSE Non FF Comd                --   .93   .94   .96   .96   .96
4. PSE panel raters                     --   .95   .90   .90   .89
5. PSE panel/survey                           --   .90   .90   .89
6. PPE all raters                                   --   1.0   1.0
7. PPE FF Comd                                            --   1.0
8. PPE Non FF Comd                                              --


Table 4

CIP-9/CIP-7 Rank Order Intercorrelations -- Vehicle Mechanics

                             2     3     4     5     6     7     8

1. CIP-9 all raters        .98   .99   .75   .89   .97   .95   .97
2. CIP-9 FF Comd            --   .94   .81   .91   .97   .96   .95
3. CIP-9 Non FF Comd              --   .70   .85   .95   .92   .96
4. CIP-9 panel raters                   --   .78   .78   .82   .76
5. CIP-9 panel/survey                         --   .88   .86   .87
6. CIP-7 all raters                                 --   .98   .99
7. CIP-7 FF Comd                                          --   .95
8. CIP-7 Non FF Comd                                            --




Table 5

PSE/PPE Rank Order Intercorrelations -- Driving Trades

                             2     3     4     5     6     7

1. PSE all raters          .82   1.0   .83   .93   .92   .94
2. PSE FF Comd              --   .99   .82   .92   .92   .93
3. PSE Non FF Comd                --   .83   .94   .93   .94
4. PSE panel raters                     --   .72   .71   .74
5. PPE all raters                             --   .99   .99
6. PPE FF Comd                                      --   .98
7. PPE Non FF Comd                                        --


Table 6

CIP-9/CIP-7 Rank Order Intercorrelations -- Driving Trades

                             2     3     4     5     6

1. CIP-9 all raters        .99   .99   .97   .97   .97
2. CIP-9 FF Comd            --   .96   .97   .97   .95
3. CIP-9 Non FF Comd              --   .97   .96   .96
4. CIP-7 all raters                     --   .99   .99
5. CIP-7 FF Comd                              --   .97
6. CIP-7 Non FF Comd                                --



Table 7

Range and Means for Rank Order Intercorrelations
of Four Task Factors Across Five Corps

             MIN     MAX     MEAN

PSE          .78     .96     .86
PPE          .81     .94     .87
CIP-9        .79     .93     .86
CIP-7        .74     .94     .84

The  scale  intercorrelations  show  that  the  experienced  raters  agreed 
closely  which  tasks  were  physically  demanding  and  which  ones  were  not,  whether 
they  used  the  PSE  scale  or  the  PPE  one.  High  agreement  was  also  evident  for 
the  CIP-9  and  CIP-7  scales.  These  results  suggest  that  essentially  the  same 
job analysis results may be obtainable regardless of which pair of scales is
used. This is discussed in more detail later in this paper. The high
intercorrelations between the sub-samples selected by major variables (i.e.,
Corps and unit type) were expected (although not mandatory) once the high
interrater  reliability  statistics  had  been  calculated  for  the  sub-samples. 
The  high  rank  order  correlations  between  the  panel  members'  ratings  and  those 
of  the  main  survey  suggest  that,  under  certain  circumstances,  the  main  survey 
may  not  be  required.  These  circumstances  are  also  discussed  later. 


Scale Validities

Although a high level of agreement exists between raters using both pairs
of scales, there were some differences in the way the scales identified tasks,
and in order to report these it is necessary to look briefly at the selection
of tasks for analysis, and some results of that task analysis. This will give
an indication of the validity of the physical demands scales for identifying
tasks for task analysis.

Tasks were sorted in order of task mean ratings and categorized using the
following decision rules: firstly, those tasks with a mean rating greater than
the grand (overall) mean plus one standard deviation on the physical demands
scales were considered as potential tasks for task analysis; secondly, these
tasks were categorized using the CIP mean ratings so that those tasks with
means greater than the grand mean plus one standard deviation were included on
a top priority task analysis list, and so on for lesser CIP categories.
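
The two decision rules can be sketched as below. This is an illustration
under stated assumptions rather than the procedure actually programmed for
the survey: the function names and task labels are hypothetical, the standard
deviation is taken over the task means (population form), and the CIP cutoff
is computed over all tasks. The report does not specify these details.

```python
from statistics import mean, pstdev


def above_mean_plus_sd(task_means):
    """Return the tasks whose mean rating exceeds the grand mean plus one SD.

    Assumption: the SD is the population SD of the task means themselves.
    """
    values = list(task_means.values())
    cutoff = mean(values) + pstdev(values)
    return {task for task, m in task_means.items() if m > cutoff}


def top_priority_tasks(demand_means, cip_means):
    """Tasks passing the physical demands rule and then the CIP rule."""
    demanding = above_mean_plus_sd(demand_means)  # rule 1: potential tasks
    critical = above_mean_plus_sd(cip_means)      # rule 2: high CIP tasks
    return demanding & critical


if __name__ == "__main__":
    # Hypothetical task means for eight tasks labelled A to H.
    demand = {"A": 9, "B": 8, "C": 3, "D": 2, "E": 2, "F": 3, "G": 2, "H": 3}
    cip = {"A": 9, "B": 3, "C": 8, "D": 2, "E": 2, "F": 3, "G": 2, "H": 3}
    print(sorted(above_mean_plus_sd(demand)))
    print(sorted(top_priority_tasks(demand, cip)))
```

Lesser CIP categories could be handled the same way with lower cutoffs, but
the cutoffs used for those categories are not given in the text.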

Structured task analysis interviews using a format modified from Ayoub,
et al. (1982) were conducted to identify precisely what made these important
tasks physically demanding. Objects were identified and estimates of weights
lifted, distances objects were moved, and forces (e.g., torques, push, pull)
applied were obtained. Objects were also weighed and technical manuals
checked for heights and torques. Estimates of frequency of task performance
were also obtained. Part of these interviews involved seeking to identify
physically demanding tasks which were not included in the list of tasks for
task analysis. Although a few were proffered in the interviews, a check
invariably showed that these had been included in the original job analysis
and had not been on the task analysis list because mean ratings had been too
low. No new tasks with significant physical demands were identified,
indicating that the job analysis using both these physical demands scales and
this method of selecting tasks had been quite adequate.

The  choice  of  physical  demands  scale  had  several  effects  on  the  set  of 
tasks  identified  by  the  task  selection  criteria.  If  the  PPE  scale  was  used, 
more  tasks  were  considered  to  be  physically  demanding  than  by  selecting  with 
the  PSE  scale.  But  the  PPE  scale  identified  all  significant  tasks  identified 
by  the  PSE  one.  Inspection  of  the  tasks  indicates  that  this  appears  to  occur 
because  raters  using  the  PSE  scale  identify  tasks  largely  on  the  basis  of  the 
weights (and possibly the height) specified in the scale anchor points. The
"OR an equivalent demand for frequent or continuous muscular effort" part of
the definitions appears to have less influence, although the "stamina" tasks
(e.g., "drive cross-country in convoy", which is mentally and physically
demanding but requires more stamina than strength) did appear further down
the ranked list of physically demanding tasks. An alternative explanation may
be that the PSE scale discriminates between strength and stamina better than
does the PPE one, and that the stamina tasks really are less physically
demanding. Assessment of metabolic cost in task performance would need to be
conducted to determine if this alternative is true, but Ayoub, et al. (1982),
using the PSE scale, Hogan, et al. (1979), using the PPE scale, and Sharp, et
al. (1980), using actual metabolic cost, all found that about 90 percent of
physically demanding tasks are so because of weights lifted and carried. Also,
the  PPE  scale  is  linked  to  metabolic  cost  via  its  development  and  the  example 
tasks  on  the  scale.  This  is  an  area  that  could  be  subject  to  more  research. 
In  practice,  there  is  little  problem  in  selecting  "weight"  tasks  from  the 
tasks  identified  by  these  job  analysis  procedures,  so  either  scale  would  have 
served  the  purpose. 




A further factor to emerge from the task analysis interviews is that
those interviewed clearly preferred the PSE scale. Possibly this was because
the PSE scale makes it clear, by using specific weights and heights, what the
rating frame of reference should be, whereas the PPE scale is less clearly
defined for these raters. This would suggest more face and content validity
for the PSE scale than for the PPE one.

Implications  for  Future  Job  Analyses 


This research suggests that the job analysis to identify important
physically demanding tasks can be done satisfactorily using either pair of
scales. Scale reliabilities are good, there are high rank order correlations
between the two physical demands scales and the two importance ones, and the
task analysis failed to identify any significant physically demanding tasks
not identified by the job analysis procedures.

Although  these  job  analysis  methods  would  be  practical  regardless  of  the 
task  analysis  methods  used  (i.e.,  metabolic  cost,  strength,  or  multiple 
physical abilities), job selection criteria that focus on strength could be
justified on the basis of research (already cited) showing the dominance of
strength requirements in performing physically demanding tasks. Use of such
criteria  offers  economy  of  resources  needed  in  selection  testing  and  in 
conducting  job  analysis.  Measurement  of  strength  is  far  simpler  than  the 
measurement  of  stamina,  and  job  analysis  could  focus  on  weights  lifted  and 
carried.  If  strength  oriented  task  analysis  procedures  are  selected  (and  thus 
concentrate  on  weights  lifted  and  carried),  then  the  PSE  scale  appears  to 
offer  advantages  over  the  PPE  one.  It  focuses  rater  attention  on  those 
characteristics  by  virtue  of  its  scale  point  definitions;  raters  felt  more 
comfortable  using  the  PSE  scale;  and  there  were  fewer  tasks  in  the  group 
selected  for  task  analysis  because  the  PSE  scale  tended  to  exclude  "stamina" 
type  tasks. 

Regardless of which set of scales is selected for use, this research
offers some very significant (and cost reducing) implications for the design
of the job analysis. Reliability statistics from this research show very good
results not only for large samples, but also for quite small ones.
Satisfactory reliability coefficients were obtained for all scales from a
sample size of k=50 raters. Since two to three hours of work time are saved
for  each  worker  not  required  to  complete  a  survey  questionnaire  booklet,  this 
can  mean  many  man-years  of  work  saved  by  applying  these  findings. 

The  high  correlation  between  the  panel  members'  ratings  and  those  of  the 
main  survey  on  the  PSE/CIP-9  scales,  and  the  high  reliability  results  for  the 
panel  ratings,  suggest  that  the  job  analysis  may  be  undertaken  by  using  the 
experienced  panel  to  develop  a  task  inventory  and  to  rate  the  tasks  using  the 
selected  scales.  (Although  the  PPE/CIP-7  scales  were  not  tested  in  this  way, 
their  high  correlations  with  PSE/CIP-9  suggest  they  would  provide  acceptable 
results.) If the interrater reliability results from the panel members are
acceptable, these could be used to identify tasks for task analysis, thus
saving the resources required for the survey. If the panel ratings do not
demonstrate acceptable reliability, then a survey of approximately 50
experienced workers could be conducted. An alternative to the survey may be
to vary the criteria used for selecting tasks for task analysis. One way to
do this would be to accept the ratings for those tasks that show high
rater agreement, and select more apparently high demand (but also high rater
variance) tasks for task analysis. Put another way, the job analysis is to
ensure  a  complete  coverage  of  the  employment  via  the  task  inventory.  The  task 
ratings are to target the use of labour, skill and equipment intensive
resources, and where there is more variance in the task ratings than is
desirable, some compensation can be made by subjecting more tasks to task
analysis. It is estimated that the successful application of the procedures
described in this paragraph would save about ten man-years of Army project
staff work if they were to be applied to the setting of physical demands
criteria for Army jobs. Many man-years of worker time would also be saved.
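
One possible reading of the varied selection criteria is sketched below. It
is not a procedure taken from the research: the function name, the three
thresholds, the task labels and the ratings are all hypothetical, chosen only
to show how tasks with a borderline mean but high rater variance could be
added to the task analysis list.

```python
from statistics import mean, stdev


def select_for_task_analysis(ratings, cutoff, relaxed_cutoff, sd_limit):
    """Pick tasks for task analysis from panel ratings.

    A task is selected if its mean rating exceeds `cutoff`, or if its mean
    exceeds the lower `relaxed_cutoff` while the raters disagree (sample SD
    above `sd_limit`). All three thresholds are illustrative assumptions.
    """
    selected = []
    for task, scores in ratings.items():
        m, sd = mean(scores), stdev(scores)
        if m > cutoff:
            selected.append(task)  # clearly demanding
        elif m > relaxed_cutoff and sd > sd_limit:
            selected.append(task)  # apparently demanding, raters disagree
    return selected


if __name__ == "__main__":
    # Hypothetical ratings from a four-member panel.
    panel = {
        "task 1": [8, 8, 9, 8],
        "task 2": [3, 8, 4, 9],
        "task 3": [1, 1, 2, 1],
        "task 4": [6, 6, 6, 5],
    }
    print(select_for_task_analysis(panel, cutoff=7, relaxed_cutoff=5, sd_limit=1.5))
```

Here the second task is added because its raters disagree, while the fourth
task, with a similar mean but close agreement, is left to its rating.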


References


Archer, W. B., & Fruchter, D. A. (1963). The construction, review, and
administration of Air Force job inventories (PRL-TDR-62-21, AD-426 755).
Lackland AFB, TX: Personnel Research Laboratory, Aerospace Medical
Division.

Astrand, P-O., & Ryhming, I. (1954). A nomogram for calculation of aerobic
capacity from pulse rate during sub-maximal work. Journal of Applied
Physiology, 7, 218-221.

Astrand, P-O., & Rodahl, K. (1977). Textbook of Work Physiology (2nd ed.). New
York: McGraw-Hill.

Ayoub, M. M., Denardo, J. D., Smith, J. L., Bethea, N. J., Lambert, B. K.,
Alley, L. R., & Duran, B. S. (1982). Establishing physical criteria for
assigning personnel to Air Force jobs (Final Report). Lubbock, TX: Texas
Tech University, Institute for Ergonomics Research.

Campion, M. A. (1983). Personnel selection for physically demanding jobs:
Review and recommendations. Personnel Psychology, 36, 527-550.

Chaffin, D. B. (1975). Ergonomics guide for the assessment of human static
strength. American Industrial Hygiene Association Journal, 36, 505-511.

Christal, R. E. (1974). The United States Air Force occupational research
project (AFHRL-TR-73-75). Lackland AFB, TX: Occupational Research
Division.

Collyer, R. S., & James, R. F. (1985). Report on pilot research to identify
and test a methodology for the setting of physical demands criteria
for Army employments (Project Team Report). Canberra, ACT: Department
of Defence, Directorate of Personnel Plans - Army (Limited distribution).

Consolazio, C. F. (1971). Energy expenditure studies in military populations
using Kofranyi-Michaelis respirators. The American Journal of Clinical
Nutrition, 24, 1431-1437.

Fine, S., & Wiley, W. (1971). An introduction to Functional Job Analysis
(Methods for Manpower Analysis #4). Kalamazoo, MI: Upjohn Institute for
Employment Research.

Fleishman, E. A., & Hogan, J. C. (1978). A taxonomic method for assessing the
physical requirements of jobs: The Physical Abilities Analysis approach
(Technical Report). Washington, DC: Advanced Research Resources
Organization.

Goody, K. (1976). Comprehensive Occupational Data Analysis Programs (CODAP):
Use of REXALL to identify divergent raters (AFHRL-TR-76-82). Lackland
AFB, TX: Occupational and Manpower Research Division.

Gott, S. P., & Alley, W. E. (1980). Physical demands for Air Force
occupations: A task analysis approach. Proceedings of the 22nd Annual
Conference of the Military Testing Association. Toronto, Canada: Canadian
Forces Personnel Applied Research Unit.


Healy, W. (1984). Occupational Analysis Report: Vehicle Mechanic Trades ECN
015, 223, 229 (Survey Report). Canberra, ACT: Department of Defence,
Directorate of Personnel Plans - Army, Military Employments Research and
Information Team (MERIT) (Limited distribution).

Hogan, J. C., Ogden, G. D., Gebhardt, D. L., & Fleishman, E. A. (1979).
Methods for evaluating the physical and effort requirements of Navy
tasks: Metabolic, performance, and physical ability correlates of
perceived effort (Contract No. N00014-78-C-0430). Washington, DC: Advanced
Research Resources Organization.

Jansen, H. P. (1982). Identification of rating policies in Training Emphasis
task factor data. Proceedings of the 24th Annual Conference of the
Military Testing Association. San Antonio, TX: US Air Force Human
Resources Laboratory and the US Air Force Occupational Measurement
Center.

Lindquist, E. F. (1953). Design and analysis of experiments in psychology and
education. Boston, MA: Houghton-Mifflin.

McCormick, E. J., Jeanneret, P. R., & Mecham, R. C. (1969). Position Analysis
Questionnaire (Purdue Research Foundation Contract No. ...).
Lafayette, IN: Purdue Research Foundation.

Military Employments Research and Information Team (MERIT). (1982). Standing
Operating Procedures. Canberra, ACT: Directorate of Personnel Plans -
Army.

Myers, D. C., Gebhardt, D. L., & Fleishman, E. A. (1980a). Development of
physical performance standards for Army jobs: The job analysis
methodology (USARIBSS-TR-446). Alexandria, VA: United States Army
Research Institute for the Behavioural and Social Sciences.

Myers, D. C., Gebhardt, D. L., & Fleishman, E. A. (1980b). Physical
performance standards for Army jobs: Criterion task manual (ARI-80-50).
Alexandria, VA: United States Army Research Institute for the Behavioural
and Social Sciences.

Sharp, D. S., Wright, J. E., Vogel, J. A., Patton, J. F., Daniels, W. L.,
Knapik, J., & Kowal, D. M. (1980). Screening for physical capacity in the
US Army: An analysis of measures predictive of strength and stamina
(USARIEM-T-8/80). Natick, MA: United States Army Research Institute of
Environmental Medicine.



Appendix 1-1

Physical Strength and Endurance

Scale Point Description of Effort

0 - No Significant Physical Demand -- Corresponding requirement would
include periodic lifting of 9 lbs or less -- includes most
administrative and clerical tasks.

1 - Extremely Light -- Corresponding requirement would include periodic
lifting of 10 - 19 lbs to a height of 5 ft OR an equivalent demand
for frequent or continuous muscular effort.

2 - Very Light -- Corresponding requirement would include periodic
lifting of 20 - 29 lbs to a height of 5 ft OR an equivalent demand for
frequent or continuous muscular effort.

3 - Light -- Corresponding requirement would include periodic lifting of
30 - 39 lbs to a height of 5 ft OR an equivalent demand for frequent
or continuous muscular effort.

4 - Light to Moderate -- Corresponding requirement would include
periodic lifting of 40 - 49 lbs to a height of 5 ft OR an equivalent
demand for frequent or continuous muscular effort.

5 - Moderate -- Corresponding requirement would include periodic lifting
of 50 - 59 lbs to a height of 5 ft OR an equivalent demand for
frequent or continuous muscular effort.

6 - Moderate to Heavy -- Corresponding requirement would include
periodic lifting of 60 - 69 lbs to a height of 5 ft OR an equivalent
demand for frequent or continuous muscular effort.

7 - Heavy -- Corresponding requirement would include periodic lifting of
70 - 79 lbs to a height of 5 ft OR an equivalent demand for frequent
or continuous muscular effort.

8 - Very Heavy -- Corresponding requirement would include periodic
lifting of 80 - 89 lbs to a height of 5 ft OR an equivalent demand
for frequent or continuous muscular effort.

9 - Extremely Heavy -- Corresponding requirement would include periodic
lifting of 90 lbs or more to a height of 5 ft OR an equivalent
demand for frequent or continuous muscular effort.

X - No knowledge of task requirement.




Appendix 1-2

PHYSICAL STRENGTH AND ENDURANCE

This scale is a measure of physical strength and endurance. Physical
strength and endurance are defined as involving significant use of the 'large'
muscle groups in the arms, back or legs. These would include requirements for
lifting, lowering or carrying heavy or cumbersome objects, pushing or pulling,
torquing or any other demand for frequent or continuous exertion or muscular
effort.

Remember, make your ratings on the basis of:

a. The most demanding aspects of each trade.

b. The level of demand placed on a single individual performing the
task, and

c. The level of demand required by the complete task from start to
finish.


1 - NONE OR EXTREMELY LIGHT PHYSICAL DEMAND - Corresponding requirement
would include periodic lifting of 5 kilos (11 lbs) or less -
includes most clerical and administrative tasks.

2 - VERY LIGHT - Corresponding requirement would include periodic
lifting of 5 - 10 kilos (11 - 22 lbs) to a height of 1.5 metres OR
an equivalent demand for frequent or continuous muscular effort.

3 - LIGHT - Corresponding requirement would include periodic lifting of
10 - 15 kilos (22 - 33 lbs) to a height of 1.5 metres OR an
equivalent demand for frequent or continuous muscular effort.

4 - LIGHT TO MODERATE - Corresponding requirement would include periodic
lifting of 15 - 20 kilos (33 - 44 lbs) to a height of 1.5 metres OR
an equivalent demand for frequent or continuous muscular effort.

5 - MODERATE - Corresponding requirement would include periodic lifting
of 20 - 25 kilos (44 - 55 lbs) to a height of 1.5 metres OR an
equivalent demand for frequent or continuous muscular effort.

6 - MODERATE TO HEAVY - Corresponding requirement would include periodic
lifting of 25 - 30 kilos (55 - 66 lbs) to a height of 1.5 metres OR
an equivalent demand for frequent or continuous muscular effort.

7 - HEAVY - Corresponding requirement would include periodic lifting of
30 - 35 kilos (66 - 77 lbs) to a height of 1.5 metres OR an
equivalent demand for frequent or continuous muscular effort.

8 - VERY HEAVY - Corresponding requirement would include periodic
lifting of 35 - 40 kilos (77 - 88 lbs) to a height of 1.5 metres OR
an equivalent demand for frequent or continuous muscular effort.

9 - EXTREMELY HEAVY - Corresponding requirement would include periodic
lifting of more than 40 kilos (88 lbs) to a height of 1.5 metres OR
an equivalent demand for frequent or continuous muscular effort.


X - No knowledge of task requirement.


Appendix 2


This is the degree of physical exertion experienced in performing a
single task or a series of tasks.


Requires Extensive                 Operate a jackhammer
Physical Exertion

                                   Perform light welding

Requires Little                    Sit at a desk using a hand
Physical Effort                    calculator


Using the 7-point scale, please rate how much Effort it takes to perform
each task.




Appendix 3


CONSEQUENCES OF INADEQUATE PERFORMANCE


This scale is a measure of the seriousness of probable consequences of
inadequate performance of an activity. It is defined in terms of
probable injury or death, failure to accomplish a critical mission,
wasted supplies, damaged equipment, and wasted man-hours of work.


1 - EXTREMELY LOW CONSEQUENCES (negligible effect on people, equipment,
mission)


2 - LOW CONSEQUENCES


3 - WELL BELOW AVERAGE CONSEQUENCES


4 - SOMEWHAT BELOW AVERAGE CONSEQUENCES


5 - AVERAGE CONSEQUENCES


6 - SOMEWHAT ABOVE AVERAGE CONSEQUENCES


7 - WELL ABOVE AVERAGE CONSEQUENCES


8 - HIGH CONSEQUENCES


9 - EXTREMELY HIGH CONSEQUENCES (may result in injury, death, serious
damage to equipment, or failure to accomplish a critical mission).


Appendix 4


CONSEQUENCES OF INADEQUATE PERFORMANCE


This scale is a measure of the seriousness of probable consequences of
inadequate performance of a job. It is defined in terms of possible injury or
death, wasted supplies, damaged equipment, and wasted man-hours of work. The
work is to be rated on a scale from 1 (Least Serious Consequences of
Inadequate Performance) to 7 (Most Serious Consequences of Inadequate
Performance) with intermediate levels defined as follows:


What will happen if the job is inadequately performed?

7. Most serious consequences (e.g., check
parachute rigging prior to personnel drop).


Moderately serious consequences (e.g., prepare
ammunition for destruction).


Least serious consequences (e.g., fold hospital
linen).


Using the 7-point scale, please rate what will happen if the task is
inadequately performed.


